Variants of Convolution in Deep Learning

Depthwise convolution: each filter is applied to a single input channel, with no summation across channels. So when the number of input channels is $n_{in}$ and the number of filters per channel is $n_{filter}$, the number of output channels is $n_{in}\times n_{filter}$.
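The channel count above can be checked with a small NumPy sketch. It is only an illustration, not an efficient implementation: the function name `depthwise_conv2d`, the weight layout `(n_in, n_filter, k, k)`, valid padding and stride 1 are all assumptions made for this example.

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Depthwise convolution sketch (valid padding, stride 1).

    x: input of shape (n_in, H, W)
    w: filters of shape (n_in, n_filter, k, k); every input channel
       has its own n_filter kernels (layout assumed for this example)
    returns: array of shape (n_in * n_filter, H - k + 1, W - k + 1)
    """
    n_in, H, W = x.shape
    _, n_filter, k, _ = w.shape
    out = np.zeros((n_in * n_filter, H - k + 1, W - k + 1))
    for c in range(n_in):
        for f in range(n_filter):
            for i in range(H - k + 1):
                for j in range(W - k + 1):
                    # Only channel c contributes: there is no sum over input channels.
                    patch = x[c, i:i + k, j:j + k]
                    out[c * n_filter + f, i, j] = np.sum(patch * w[c, f])
    return out

x = np.random.rand(3, 8, 8)      # n_in = 3
w = np.random.rand(3, 2, 3, 3)   # n_filter = 2, 3x3 kernels
y = depthwise_conv2d(x, w)
print(y.shape)  # (6, 6, 6): 3 * 2 output channels
```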
Pointwise convolution: a $1\times 1$ convolution, i.e. a fully connected layer across all the channels, applied independently at each spatial position.
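As a minimal sketch, a pointwise convolution can be written as one matrix multiplication over the channel axis at every pixel. The name `pointwise_conv2d` and the weight shape `(n_out, n_in)` are assumptions for this example, and biases are left out.

```python
import numpy as np

def pointwise_conv2d(x, w):
    """Pointwise (1x1) convolution sketch.

    x: input of shape (n_in, H, W)
    w: weights of shape (n_out, n_in)
    returns: array of shape (n_out, H, W)
    """
    # Mix channels at each spatial position; positions do not interact.
    return np.einsum("oc,chw->ohw", w, x)

x = np.random.rand(4, 5, 5)
w = np.random.rand(8, 4)
y = pointwise_conv2d(x, w)
print(y.shape)  # (8, 5, 5)
```

At any single position, the output vector is simply `w @ x[:, i, j]`, which is what makes the operation "fully connected across channels".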
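Group convolution: divide the channels into several groups and perform a convolution within each group independently. Note that pointwise convolution is the special case of a $1\times 1$ group convolution with only one group, and depthwise convolution is the special case where the number of groups equals the number of input channels. Group convolution, in its pointwise form, is used in ShuffleNet.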
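The sketch below illustrates grouping for the $1\times 1$ case; larger kernels follow the same pattern. The function name `grouped_pointwise_conv2d` and the weight shape `(n_out, n_in // groups)` are assumptions for this example.

```python
import numpy as np

def grouped_pointwise_conv2d(x, w, groups):
    """Grouped 1x1 convolution sketch.

    x: input of shape (n_in, H, W)
    w: weights of shape (n_out, n_in // groups)
    returns: array of shape (n_out, H, W)
    """
    n_in, n_out = x.shape[0], w.shape[0]
    assert n_in % groups == 0 and n_out % groups == 0
    in_per_group = n_in // groups
    out_per_group = n_out // groups
    outputs = []
    for g in range(groups):
        # Each group of output channels sees only its own slice of input channels.
        x_g = x[g * in_per_group:(g + 1) * in_per_group]
        w_g = w[g * out_per_group:(g + 1) * out_per_group]
        outputs.append(np.einsum("oc,chw->ohw", w_g, x_g))
    return np.concatenate(outputs, axis=0)

x = np.random.rand(8, 4, 4)
w = np.random.rand(16, 4)   # groups=2, so each output channel sees 8 // 2 = 4 inputs
y = grouped_pointwise_conv2d(x, w, groups=2)
print(y.shape)  # (16, 4, 4)
print(w.size)   # 64 weights, versus 16 * 8 = 128 with a single group
```

With `groups=1` the function reduces to an ordinary pointwise convolution, and the weight count shrinks by a factor of `groups` otherwise.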
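Depthwise separable convolution = depthwise convolution followed by pointwise convolution. The depthwise step filters each channel spatially, and the pointwise step mixes the channels.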
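A short arithmetic sketch shows why this factorization is attractive. It counts weights only (biases ignored) and assumes one depthwise filter per input channel; the helper names are made up for this example.

```python
def standard_conv_params(k, n_in, n_out):
    # Every output channel has a k x k kernel for every input channel.
    return k * k * n_in * n_out

def separable_conv_params(k, n_in, n_out):
    depthwise = k * k * n_in   # one k x k kernel per input channel
    pointwise = n_in * n_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768
```

The ratio of the two counts is $1/n_{out} + 1/k^2$, so for $3\times 3$ kernels the separable version needs roughly one ninth of the weights when $n_{out}$ is large.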
Dilated convolution (also called atrous convolution): insert gaps between the kernel taps to enlarge the receptive field without increasing the number of parameters; typically used for segmentation.
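A 1D sketch makes the effect concrete: a kernel with $k$ taps and dilation $d$ covers $(k-1)d+1$ input samples while still having only $k$ weights. The function name `dilated_conv1d`, valid padding and stride 1 are assumptions for this example.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """1D dilated convolution sketch (valid padding, stride 1)."""
    k = len(w)
    span = (k - 1) * dilation + 1   # effective kernel size
    out_len = len(x) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # Kernel taps are spaced `dilation` samples apart.
        out[i] = np.dot(x[i:i + span:dilation], w)
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
y1 = dilated_conv1d(x, w, dilation=1)  # 8 outputs: 3, 6, ..., 24
y2 = dilated_conv1d(x, w, dilation=2)  # 6 outputs: 6, 9, ..., 21
print(len(y1), len(y2))  # 8 6
```

Both calls use the same three weights; only the spacing of the taps, and hence the receptive field (3 versus 5 samples), changes.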
Deformable convolution (left) and spatial transformer network (right): both methods sample the input feature map irregularly by adjusting the coordinates at which it is read. Deformable convolution learns an offset for each sampling location, while a spatial transformer network learns an affine transformation of the whole sampling grid.
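The difference between the two can be sketched with a shared bilinear sampler. This is a simplified illustration under several assumptions: coordinates are in pixels rather than normalized, the offsets and the matrix `theta` are hand-picked values standing in for learned ones, and all names are hypothetical.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img (H, W) at fractional coordinates, clamped to the image."""
    H, W = img.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = ys - y0
    wx = xs - x0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

img = np.arange(25, dtype=float).reshape(5, 5)

# Regular 3x3 sampling grid centered on pixel (2, 2).
gy, gx = np.meshgrid(np.arange(1.0, 4.0), np.arange(1.0, 4.0), indexing="ij")

# Deformable-convolution style: an independent offset per sampling location.
offsets_y = np.full((3, 3), 0.5)   # hypothetical values; learned in practice
offsets_x = np.zeros((3, 3))
deformed = bilinear_sample(img, gy + offsets_y, gx + offsets_x)
kernel = np.ones((3, 3)) / 9.0     # hypothetical kernel weights
conv_out = np.sum(kernel * deformed)

# Spatial-transformer style: one affine matrix moves the whole grid.
theta = np.array([[1.0, 0.0, 0.5],   # hypothetical: shift by 0.5 along y
                  [0.0, 1.0, 0.0]])
coords = np.stack([gy.ravel(), gx.ravel(), np.ones(9)])  # homogeneous, shape (3, 9)
ay, ax = theta @ coords
affine = bilinear_sample(img, ay.reshape(3, 3), ax.reshape(3, 3))

print(np.allclose(deformed, affine))  # True: this theta equals a uniform offset
```

Here the two agree because the chosen offsets happen to be a pure translation. In general, per-location offsets can bend the grid arbitrarily, whereas the affine map is limited to six parameters shared by all locations.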
Squeeze-and-Excitation: learn a separate weight for each channel and use it to rescale that channel's feature map.
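A minimal NumPy sketch of the channel reweighting is given below. It assumes the usual squeeze step (global average pooling) followed by two fully connected layers with a reduction ratio `r`; biases are omitted and the random weights stand in for learned ones.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-Excitation sketch.

    x:  feature map of shape (C, H, W)
    w1: bottleneck weights of shape (C // r, C)
    w2: expansion weights of shape (C, C // r)
    returns: (rescaled feature map, per-channel weights)
    """
    z = x.mean(axis=(1, 2))                 # squeeze: one scalar per channel
    h = np.maximum(w1 @ z, 0.0)             # fully connected + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))     # fully connected + sigmoid
    return x * s[:, None, None], s          # excite: rescale each channel

rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.random((C, 5, 5))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y, s = squeeze_excite(x, w1, w2)
print(y.shape, s.shape)  # (8, 5, 5) (8,)
```

Each entry of `s` lies between 0 and 1 and multiplies every spatial position of its channel by the same factor.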